1 This is a sidenote that was entered using a footnote.
Redundancy analysis is a constrained ordination method that can be used to determine the extent to which one set of variables (explanatory variables) explains the variation in another set of variables (response variables), i.e. it is the multivariate analog of a simple linear regression (Oksanen et al. 2010). It assumes that the relationship between the dependent (response) and independent (explanatory) variables is linear. Here, it can be applied to understand which environmental parameters of the grow-out pond significantly explain a component of the observed pattern of genetic variation of progeny (Forester et al. 2016, 2018).
A partial RDA enables examining the effect of a set of variables (explanatory variables, X) on a response matrix (Y) while controlling for a set of conditioning variables (Z) and can be applied in a case such as here where the strong effect of a set of well-characterized variables (parentage/family) might obscure the signal of a second set of variables (environmental conditions).
Thin SNPs so only one SNP per contig is retained to reduce linkage among markers.
# thin SNPs within 350 bp of each other
vcftools --vcf data/GENOME_SCAN/SOC.yoy.genotyped.recode.vcf --out data/GENOME_SCAN/SOC.yoy.thinned --thin 350 --recode --recode-INFO-all
Format response variables (genetic data) as allele counts per locus and individual, homozygous genotypes are coded as 0 or 2, heterozygotes as 1.
RDA requires a complete data set, therefore missing values are replaced with the mean allele frequency.
The response variables consist of 2023 loci genotyped for 828 progeny across three spawning events (three sample points per spawning event).
Parentage (dam + sire) of each offspring creates the dominant signal in the data set. Because categorical data cannot be used, parentage will be converted into “dummy variables”, i.e. we will create a matrix with each row corresponding to a single progeny and on column per potential parent. Parents will be coded as 1 non-parental adults as 0.
The spawning events took place in 2017 (N = 1) and 2018 (N = 2). Grow-out ponds exclude predators and ensure abundant prey and while physical parameters (temperature, dissolved oxygen, pH, and salinity) are monitored, they are not controlled with the exception of dissolved oxygen which is maintained by an active water paddle which can be manipulated.
Table 1: Number of individuals at different stages (incubator, stocking into grow-out ponds, harvest) and comparison of survival in incubator and grow-out pond.
| EVENT | EGGS_INCUBATED | FRY_HARVESTED | FRY_STOCKED | YOY_HARVESTED | PROP_SURV_INCUB | PROP_SURV_POND |
|---|---|---|---|---|---|---|
| YOY-1 | 1050000 | 625628 | 395254 | 111306 | 0.60 | 0.28 |
| YOY-2 | 690000 | 375560 | 375560 | 343508 | 0.54 | 0.91 |
| YOY-3 | 1510000 | 784554 | 428230 | 30099 | 0.52 | 0.07 |
The number of eggs placed in the incubator is largely consistent across spawning events 1 and 3 (approx. 1 Mill). By contrast, for spawning event 2 only approximate 690,000 were placed in the incubator; despite though the number of brood fish involved in the spawning event 2 being the highest (individuals from 4 tanks compared to 2 and 3).
Though fry were stocked into the grow-out ponds at similar levels (375,000 - 428,000) the proprtion of progeny that survived to be harvested for stocking varied greatly from 7% (spawning event 3), to 28% spawning event 1, and 91% (spawning event 2). By contrast, survival during the time in the incubator was largely consistent (52 - 60%).
Despite spawning event 2 having the lowest number of eggs incubated, due to the highest proportion of progeny surviving to the next stage both in the incubator and the grow-out pond it had the highest number of progeny harvested for stocking at 343,000 compared to 111,000 and 30,000 for spawning event 1 and 3, respectively. This could be due to differences in environmental conditions of the pond or due to the fact that this pond was harvested early due to construction.
Fig 1: Change in length [mm] of progreny over time spend in grow-out pond for each spawning event.
##
## linear regression growth YOY-1
##
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
##
## Residuals:
## 1 2 3 4 5
## 1.0008 -0.4809 -0.8673 -0.4354 0.7828
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.55836 1.48335 4.421 0.0215 *
## DAY 0.29545 0.06571 4.496 0.0205 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9639 on 3 degrees of freedom
## Multiple R-squared: 0.8708, Adjusted R-squared: 0.8277
## F-statistic: 20.22 on 1 and 3 DF, p-value: 0.02054
## slope (growth) for spawning event 1: 0.3
##
## linear regression growth YOY-2
##
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
##
## Residuals:
## 1 2 3 4 5 6 7
## -2.9313 0.3588 -0.2878 4.5290 0.4725 1.2160 -3.3573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.2183 3.1577 4.186 0.0086 **
## DAY 0.3366 0.1074 3.135 0.0258 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.908 on 5 degrees of freedom
## Multiple R-squared: 0.6629, Adjusted R-squared: 0.5954
## F-statistic: 9.831 on 1 and 5 DF, p-value: 0.0258
## slope (growth) for spawning event 2: 0.34
##
## linear regression growth YOY-3
##
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9572 -0.9204 0.3286 0.9082 1.9897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.11013 1.21904 5.833 0.000642 ***
## DAY 0.10409 0.03448 3.019 0.019410 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.48 on 7 degrees of freedom
## Multiple R-squared: 0.5656, Adjusted R-squared: 0.5036
## F-statistic: 9.115 on 1 and 7 DF, p-value: 0.01941
## slope (growth) for spawning event 3: 0.1
Linear regressions show a similar slope for spawning event 1 and 2 (0.3 and 0.34), while the slope for spawning event three is lower (0.1); indicating slowest growth. Generally, fish are expected to spend approximately 30 days in a pond and be about 3 cm when harvested, though growth rates may differ among ponds, especially due to differences in temperature.
Fish were harvested from grow-out pond 2 at a shorter size (approx 1.5 cm) than usually desired (2.5 - 3 cm) due to construction. Due to slow, growth fish from pond 3 were also harvested at a smaller age and spent the most amount of time in the grow-out ponds (> 50).
Fig 2a: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 1. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).
Pond culture for spawning event 3 is characterized by a decrease in temperature between sampling point T1 and T2 which then increases again to experience a second more steep (but not as drastic) decline.
Fig 2b: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 2. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).
Spawning event 2 has comparatively lower salinity overall and experiences a steep decline in temperature after sampling point T2 after which there is a slow, steady increase in temperature.
Fig 2c: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 3. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).
Spawning event 3 has similarly low levels of salinity compared to spawning event 2. This pond was stocked about a week after the pond used for spawning event 2, so it experiences the same steep decline in temperature follow by a gradualt increase but then experiences a second similarly steep but even more drastic drop in temperatures after which the temperatures remain low.
Fig 3: Correlation among physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond (example spawning event 1).
Morning, evening, and afternoon measurements of temperature are correlated. The same pattern is observed for dissolved oxygen.
Fig 4: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).
Not all sampled progeny experienced the entire range of values, i.e. they are removed at different time points. Calculate mean values for three time ranges (day progeny are stocked to time points 1 - 3).
Fig 5: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).
Test for significant differences among physical parameters of grow-out ponds for each time range per spawning event.
Temperature (morning)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 1562 195.29 12.09 0.0000000000000185 ***
## Residuals 237 3829 16.16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2a: Significant pairwise comparisons of morning temperature using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-2_1 | YOY-1_2 | 4.819028 | 0.027 |
| YOY-2_1 | YOY-1_3 | 5.508352 | 0.001 |
| YOY-2_2 | YOY-1_2 | 4.542105 | 0.010 |
| YOY-2_2 | YOY-1_3 | 5.231429 | 0.000 |
| YOY-2_3 | YOY-2_2 | -3.470213 | 0.023 |
| YOY-3_1 | YOY-1_3 | 4.981429 | 0.008 |
| YOY-3_3 | YOY-2_1 | -7.535653 | 0.000 |
| YOY-3_3 | YOY-2_2 | -7.258730 | 0.000 |
| YOY-3_3 | YOY-2_3 | -3.788517 | 0.000 |
| YOY-3_3 | YOY-3_1 | -7.008730 | 0.000 |
| YOY-3_3 | YOY-3_2 | -4.583730 | 0.000 |
Temperature (afternoon)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 1772 221.51 11.27 0.000000000000161 ***
## Residuals 239 4698 19.66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2b: Significant pairwise comparisons of afternoon temperature using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-2_1 | YOY-1_2 | 5.049615 | 0.041 |
| YOY-2_1 | YOY-1_3 | 5.404060 | 0.006 |
| YOY-2_2 | YOY-1_2 | 4.651956 | 0.020 |
| YOY-2_2 | YOY-1_3 | 5.006401 | 0.001 |
| YOY-2_3 | YOY-2_2 | -3.582701 | 0.044 |
| YOY-3_1 | YOY-1_3 | 5.152778 | 0.017 |
| YOY-3_3 | YOY-2_1 | -8.171917 | 0.000 |
| YOY-3_3 | YOY-2_2 | -7.774258 | 0.000 |
| YOY-3_3 | YOY-2_3 | -4.191557 | 0.000 |
| YOY-3_3 | YOY-3_1 | -7.920635 | 0.000 |
| YOY-3_3 | YOY-3_2 | -4.087302 | 0.005 |
Temperature (evening)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 1616 201.98 11.65 0.0000000000000578 ***
## Residuals 239 4144 17.34
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2c: Significant pairwise comparisons of evening temperature using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-2_1 | YOY-1_2 | 5.094 | 0.020 |
| YOY-2_1 | YOY-1_3 | 5.653 | 0.001 |
| YOY-2_2 | YOY-1_2 | 4.299 | 0.024 |
| YOY-2_2 | YOY-1_3 | 4.857 | 0.001 |
| YOY-2_3 | YOY-2_1 | -4.220 | 0.037 |
| YOY-2_3 | YOY-2_2 | -3.425 | 0.037 |
| YOY-3_1 | YOY-1_3 | 5.242 | 0.006 |
| YOY-3_3 | YOY-2_1 | -7.950 | 0.000 |
| YOY-3_3 | YOY-2_2 | -7.155 | 0.000 |
| YOY-3_3 | YOY-2_3 | -3.730 | 0.000 |
| YOY-3_3 | YOY-3_1 | -7.539 | 0.000 |
| YOY-3_3 | YOY-3_2 | -4.564 | 0.000 |
Dissolved Oxygen (morning)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 40.1 5.017 3.118 0.00227 **
## Residuals 239 384.5 1.609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2d: Significant pairwise comparisons of dissolved oxygen in the morning using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-3_3 | YOY-1_3 | 0.974 | 0.009 |
| YOY-3_3 | YOY-2_2 | 1.101 | 0.013 |
Dissolved Oxygen (afternoon)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 60.0 7.503 4.524 0.0000387 ***
## Residuals 239 396.4 1.658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2e: Significant pairwise comparisons of dissolved oxygen in the afternoon using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-2_1 | YOY-1_1 | -2.092 | 0.005 |
| YOY-2_1 | YOY-1_2 | -1.587 | 0.018 |
| YOY-2_2 | YOY-1_1 | -1.648 | 0.024 |
| YOY-3_3 | YOY-2_1 | 1.638 | 0.001 |
| YOY-3_3 | YOY-2_2 | 1.194 | 0.006 |
Dissolved Oxygen (evening)
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 36.0 4.494 2.284 0.0226 *
## Residuals 239 470.3 1.968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2f: Significant pairwise comparisons of dissolved oxygen in the evening using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-3_3 | YOY-1_3 | 0.981 | 0.026 |
Salinity
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 13846 1730.7 2211 <0.0000000000000002 ***
## Residuals 239 187 0.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2g: Significant pairwise comparisons of salinity using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-1_3 | YOY-1_1 | 1.644 | 0 |
| YOY-1_3 | YOY-1_2 | 1.294 | 0 |
| YOY-2_1 | YOY-1_1 | -17.377 | 0 |
| YOY-2_1 | YOY-1_2 | -17.727 | 0 |
| YOY-2_1 | YOY-1_3 | -19.021 | 0 |
| YOY-2_2 | YOY-1_1 | -17.604 | 0 |
| YOY-2_2 | YOY-1_2 | -17.954 | 0 |
| YOY-2_2 | YOY-1_3 | -19.249 | 0 |
| YOY-2_3 | YOY-1_1 | -17.343 | 0 |
| YOY-2_3 | YOY-1_2 | -17.693 | 0 |
| YOY-2_3 | YOY-1_3 | -18.987 | 0 |
| YOY-3_1 | YOY-1_1 | -14.550 | 0 |
| YOY-3_1 | YOY-1_2 | -14.900 | 0 |
| YOY-3_1 | YOY-1_3 | -16.194 | 0 |
| YOY-3_1 | YOY-2_1 | 2.827 | 0 |
| YOY-3_1 | YOY-2_2 | 3.054 | 0 |
| YOY-3_1 | YOY-2_3 | 2.793 | 0 |
| YOY-3_2 | YOY-1_1 | -14.258 | 0 |
| YOY-3_2 | YOY-1_2 | -14.608 | 0 |
| YOY-3_2 | YOY-1_3 | -15.903 | 0 |
| YOY-3_2 | YOY-2_1 | 3.119 | 0 |
| YOY-3_2 | YOY-2_2 | 3.346 | 0 |
| YOY-3_2 | YOY-2_3 | 3.084 | 0 |
| YOY-3_3 | YOY-1_1 | -13.760 | 0 |
| YOY-3_3 | YOY-1_2 | -14.110 | 0 |
| YOY-3_3 | YOY-1_3 | -15.405 | 0 |
| YOY-3_3 | YOY-2_1 | 3.617 | 0 |
| YOY-3_3 | YOY-2_2 | 3.844 | 0 |
| YOY-3_3 | YOY-2_3 | 3.582 | 0 |
pH
##
## RESULTS ANOVA
## Df Sum Sq Mean Sq F value Pr(>F)
## RANGE 8 3.880 0.4850 19.52 <0.0000000000000002 ***
## Residuals 239 5.939 0.0249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.
Table 2h: Significant pairwise comparisons of PH using Tukey-Honest Significanct Differences test.
| RANGE1 | RANGE2 | diff | p adj |
|---|---|---|---|
| YOY-2_3 | YOY-1_1 | -0.187 | 0.022 |
| YOY-2_3 | YOY-1_2 | -0.242 | 0.000 |
| YOY-2_3 | YOY-1_3 | -0.149 | 0.001 |
| YOY-3_1 | YOY-1_2 | -0.182 | 0.046 |
| YOY-3_2 | YOY-1_1 | -0.218 | 0.009 |
| YOY-3_2 | YOY-1_2 | -0.273 | 0.000 |
| YOY-3_2 | YOY-1_3 | -0.181 | 0.001 |
| YOY-3_3 | YOY-1_1 | -0.332 | 0.000 |
| YOY-3_3 | YOY-1_2 | -0.387 | 0.000 |
| YOY-3_3 | YOY-1_3 | -0.294 | 0.000 |
| YOY-3_3 | YOY-2_1 | -0.222 | 0.000 |
| YOY-3_3 | YOY-2_2 | -0.248 | 0.000 |
| YOY-3_3 | YOY-2_3 | -0.146 | 0.000 |
| YOY-3_3 | YOY-3_1 | -0.206 | 0.002 |
Calculate mean, median, 5th and 95th percentile for salinity, pH, temperature, and dissolved oxygen measurements as potential constraining (explanatory) variables for RDA.
Run step-wise selection of variables using p(in)-value < 0.15 and p(out)-value > 0.1; run forward selection of variable using R2 value and Pin < 0.1 to identify best model(s).
##
##
## STEPWISE SELECTION OF VARIABLES (P(in) = 0.15/P(out) = 0.1
##
## Start: allelecount ~ 1
##
## Df AIC F Pr(>F)
## + SALINITY_qnt_5 1 4828.7 42.6718 0.005 **
## + SALINITY_MEAN 1 4829.1 42.2160 0.005 **
## + SALINITY_qnt_95 1 4829.2 42.1228 0.005 **
## + SALINITY_qnt_50 1 4829.5 41.7960 0.005 **
## + PH_qnt_95 1 4830.2 41.0988 0.005 **
## + TEMP_AFTERN_qnt_95 1 4830.3 41.0247 0.005 **
## + PH_MEAN 1 4833.4 37.7687 0.005 **
## + PH_qnt_50 1 4840.3 30.5942 0.005 **
## + PH_qnt_5 1 4845.5 25.2515 0.005 **
## + DO_EVEN_MEAN 1 4855.3 15.1787 0.005 **
## + TEMP_AFTERN_qnt_50 1 4859.1 11.3590 0.005 **
## + TEMP_AFTERN_MEAN 1 4863.2 7.2169 0.005 **
## + TEMP_AFTERN_qnt_5 1 4865.8 4.6397 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5
##
## Df AIC F Pr(>F)
## - SALINITY_qnt_5 1 4868.4 42.672 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + PH_qnt_95 1 4814.9 15.9024 0.005 **
## + DO_EVEN_MEAN 1 4820.4 10.3716 0.005 **
## + PH_MEAN 1 4820.6 10.0927 0.005 **
## + PH_qnt_5 1 4823.3 7.4311 0.005 **
## + PH_qnt_50 1 4824.0 6.7116 0.005 **
## + TEMP_AFTERN_MEAN 1 4825.4 5.2898 0.005 **
## + TEMP_AFTERN_qnt_5 1 4825.5 5.1930 0.005 **
## + SALINITY_qnt_50 1 4825.7 5.0437 0.005 **
## + TEMP_AFTERN_qnt_95 1 4826.7 3.9652 0.005 **
## + TEMP_AFTERN_qnt_50 1 4826.9 3.7930 0.005 **
## + SALINITY_MEAN 1 4828.3 2.4122 0.005 **
## + SALINITY_qnt_95 1 4828.7 1.9813 0.010 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95
##
## Df AIC F Pr(>F)
## - PH_qnt_95 1 4828.7 15.902 0.005 **
## - SALINITY_qnt_5 1 4830.2 17.428 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + SALINITY_qnt_95 1 4812.6 4.2985 0.005 **
## + TEMP_AFTERN_MEAN 1 4813.4 3.4734 0.005 **
## + SALINITY_MEAN 1 4813.4 3.4320 0.005 **
## + PH_qnt_5 1 4813.5 3.3943 0.005 **
## + TEMP_AFTERN_qnt_5 1 4813.6 3.3283 0.005 **
## + TEMP_AFTERN_qnt_50 1 4813.9 2.9706 0.005 **
## + DO_EVEN_MEAN 1 4814.0 2.8568 0.005 **
## + PH_MEAN 1 4814.1 2.7575 0.005 **
## + SALINITY_qnt_50 1 4814.1 2.7505 0.005 **
## + TEMP_AFTERN_qnt_95 1 4814.3 2.5888 0.005 **
## + PH_qnt_50 1 4814.5 2.4242 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95
##
## Df AIC F Pr(>F)
## - SALINITY_qnt_95 1 4814.9 4.2985 0.005 **
## - SALINITY_qnt_5 1 4815.3 4.7336 0.005 **
## - PH_qnt_95 1 4828.7 18.2418 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + TEMP_AFTERN_MEAN 1 4810.1 4.4276 0.005 **
## + TEMP_AFTERN_qnt_5 1 4810.6 3.9371 0.005 **
## + PH_qnt_50 1 4810.8 3.7533 0.005 **
## + PH_MEAN 1 4811.1 3.4254 0.005 **
## + TEMP_AFTERN_qnt_50 1 4811.1 3.4189 0.005 **
## + PH_qnt_5 1 4811.3 3.2216 0.005 **
## + TEMP_AFTERN_qnt_95 1 4811.4 3.1220 0.005 **
## + DO_EVEN_MEAN 1 4811.6 2.9571 0.005 **
## + SALINITY_MEAN 1 4811.9 2.6432 0.005 **
## + SALINITY_qnt_50 1 4813.2 1.4176 0.060 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN
##
## Df AIC F Pr(>F)
## - TEMP_AFTERN_MEAN 1 4812.6 4.4276 0.005 **
## - SALINITY_qnt_95 1 4813.4 5.2526 0.005 **
## - SALINITY_qnt_5 1 4813.7 5.5064 0.005 **
## - PH_qnt_95 1 4825.3 17.2865 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + DO_EVEN_MEAN 1 4805.2 6.8815 0.005 **
## + PH_qnt_5 1 4808.0 4.1348 0.005 **
## + SALINITY_qnt_50 1 4809.4 2.6836 0.005 **
## + PH_MEAN 1 4810.7 1.4787 0.030 *
## + PH_qnt_50 1 4810.8 1.3144 0.045 *
## + SALINITY_MEAN 1 4810.9 1.2001 0.140
## + TEMP_AFTERN_qnt_5 1 4811.0 1.1569 0.175
## + TEMP_AFTERN_qnt_50 1 4811.1 1.0260 0.380
## + TEMP_AFTERN_qnt_95 1 4811.1 0.9992 0.445
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN
##
## Df AIC F Pr(>F)
## - DO_EVEN_MEAN 1 4810.1 6.8815 0.005 **
## - TEMP_AFTERN_MEAN 1 4811.6 8.3572 0.005 **
## - PH_qnt_95 1 4813.1 9.8634 0.005 **
## - SALINITY_qnt_5 1 4813.1 9.8740 0.005 **
## - SALINITY_qnt_95 1 4813.3 10.0301 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + PH_qnt_5 1 4804.7 2.4785 0.005 **
## + PH_MEAN 1 4805.0 2.2333 0.005 **
## + TEMP_AFTERN_qnt_5 1 4805.4 1.8285 0.010 **
## + PH_qnt_50 1 4805.3 1.9241 0.015 *
## + SALINITY_qnt_50 1 4805.7 1.5258 0.025 *
## + TEMP_AFTERN_qnt_50 1 4805.8 1.4677 0.025 *
## + SALINITY_MEAN 1 4805.9 1.2993 0.070 .
## + TEMP_AFTERN_qnt_95 1 4806.0 1.1833 0.175
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5
##
## Df AIC F Pr(>F)
## - SALINITY_qnt_5 1 4804.7 1.9499 0.010 **
## - SALINITY_qnt_95 1 4804.8 2.0395 0.010 **
## - PH_qnt_5 1 4805.2 2.4785 0.005 **
## - PH_qnt_95 1 4807.5 4.7041 0.005 **
## - DO_EVEN_MEAN 1 4808.0 5.2164 0.005 **
## - TEMP_AFTERN_MEAN 1 4809.8 7.0638 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + PH_MEAN 1 4805.4 1.3144 0.060 .
## + PH_qnt_50 1 4805.4 1.3119 0.060 .
## + SALINITY_qnt_50 1 4805.4 1.3199 0.080 .
## + SALINITY_MEAN 1 4805.4 1.2883 0.105
## + TEMP_AFTERN_qnt_95 1 4805.6 1.1220 0.190
## + TEMP_AFTERN_qnt_5 1 4805.6 1.1511 0.195
## + TEMP_AFTERN_qnt_50 1 4805.6 1.1050 0.205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + PH_MEAN
##
## Df AIC F Pr(>F)
## - PH_MEAN 1 4804.7 1.3144 0.125
## - SALINITY_qnt_95 1 4804.7 1.3016 0.080 .
## - SALINITY_qnt_5 1 4804.8 1.4085 0.035 *
## - PH_qnt_5 1 4805.0 1.5591 0.035 *
## - DO_EVEN_MEAN 1 4805.1 1.6528 0.020 *
## - PH_qnt_95 1 4806.0 2.5362 0.005 **
## - TEMP_AFTERN_MEAN 1 4810.5 7.0965 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5
##
## Df AIC F Pr(>F)
## + PH_qnt_50 1 4805.4 1.3119 0.070 .
## + SALINITY_qnt_50 1 4805.4 1.3199 0.085 .
## + PH_MEAN 1 4805.4 1.3144 0.100 .
## + SALINITY_MEAN 1 4805.4 1.2883 0.110
## + TEMP_AFTERN_qnt_5 1 4805.6 1.1511 0.155
## + TEMP_AFTERN_qnt_50 1 4805.6 1.1050 0.220
## + TEMP_AFTERN_qnt_95 1 4805.6 1.1220 0.260
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + PH_qnt_50
##
## Df AIC F Pr(>F)
## - PH_qnt_50 1 4804.7 1.3119 0.095 .
## - PH_qnt_5 1 4805.3 1.8652 0.020 *
## - SALINITY_qnt_95 1 4805.1 1.6343 0.015 *
## - SALINITY_qnt_5 1 4805.2 1.7276 0.015 *
## - DO_EVEN_MEAN 1 4805.7 2.2419 0.005 **
## - PH_qnt_95 1 4808.1 4.6475 0.005 **
## - TEMP_AFTERN_MEAN 1 4810.5 7.0067 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + SALINITY_MEAN 1 4806.3 1.1423 0.175
## + TEMP_AFTERN_qnt_50 1 4806.3 1.1423 0.190
## + PH_MEAN 1 4806.3 1.1423 0.200
## + TEMP_AFTERN_qnt_95 1 4806.3 1.1423 0.225
## + TEMP_AFTERN_qnt_5 1 4806.3 1.1423 0.235
## + SALINITY_qnt_50 1 4806.3 1.1423 0.255
##
##
## FORWARD SELECTION OF VARIABLES USING R2 value.
## Step: R2.adj= 0
## Call: allelecount ~ 1
##
## R2.adjusted
## <All variables> 0.081170092
## + SALINITY_qnt_5 0.047971864
## + SALINITY_MEAN 0.047472101
## + SALINITY_qnt_95 0.047369779
## + SALINITY_qnt_50 0.047011036
## + PH_qnt_95 0.046244810
## + TEMP_AFTERN_qnt_95 0.046163260
## + PH_MEAN 0.042567761
## + PH_qnt_50 0.034548656
## + PH_qnt_5 0.028489278
## + DO_EVEN_MEAN 0.016855715
## + TEMP_AFTERN_qnt_50 0.012371010
## + TEMP_AFTERN_MEAN 0.007461264
## + TEMP_AFTERN_qnt_5 0.004381808
## <none> 0.000000000
##
## Df AIC F Pr(>F)
## + SALINITY_qnt_5 1 4828.7 42.672 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.04797186
## Call: allelecount ~ SALINITY_qnt_5
##
## R2.adjusted
## <All variables> 0.08117009
## + PH_qnt_95 0.06484362
## + DO_EVEN_MEAN 0.05865220
## + PH_MEAN 0.05833784
## + PH_qnt_5 0.05532689
## + PH_qnt_50 0.05450967
## + TEMP_AFTERN_MEAN 0.05289066
## + TEMP_AFTERN_qnt_5 0.05278021
## + SALINITY_qnt_50 0.05260988
## + TEMP_AFTERN_qnt_95 0.05137729
## + TEMP_AFTERN_qnt_50 0.05118021
## + SALINITY_MEAN 0.04959677
## + SALINITY_qnt_95 0.04910154
## <none> 0.04797186
##
## Df AIC F Pr(>F)
## + PH_qnt_95 1 4814.9 15.902 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.06484362
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95
##
## R2.adjusted
## <All variables> 0.08117009
## + SALINITY_qnt_95 0.06856765
## + TEMP_AFTERN_MEAN 0.06763894
## + SALINITY_MEAN 0.06759221
## + PH_qnt_5 0.06754970
## + TEMP_AFTERN_qnt_5 0.06747536
## + TEMP_AFTERN_qnt_50 0.06707204
## + DO_EVEN_MEAN 0.06694364
## + PH_MEAN 0.06683160
## + SALINITY_qnt_50 0.06682361
## + TEMP_AFTERN_qnt_95 0.06664108
## + PH_qnt_50 0.06645517
## <none> 0.06484362
##
## Df AIC F Pr(>F)
## + SALINITY_qnt_95 1 4812.6 4.2985 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.06856765
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95
##
## R2.adjusted
## <All variables> 0.08117009
## + TEMP_AFTERN_MEAN 0.07242610
## + TEMP_AFTERN_qnt_5 0.07187588
## + PH_qnt_50 0.07166958
## + PH_MEAN 0.07130127
## + TEMP_AFTERN_qnt_50 0.07129398
## + PH_qnt_5 0.07107212
## + TEMP_AFTERN_qnt_95 0.07096014
## + DO_EVEN_MEAN 0.07077469
## + SALINITY_MEAN 0.07042134
## + SALINITY_qnt_50 0.06903948
## <none> 0.06856765
##
## Df AIC F Pr(>F)
## + TEMP_AFTERN_MEAN 1 4810.1 4.4276 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.0724261
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN
##
## R2.adjusted
## <All variables> 0.08117009
## + DO_EVEN_MEAN 0.07900793
## + PH_qnt_5 0.07594580
## + SALINITY_qnt_50 0.07431980
## + PH_MEAN 0.07296529
## + PH_qnt_50 0.07278029
## + SALINITY_MEAN 0.07265161
## + TEMP_AFTERN_qnt_5 0.07260285
## + TEMP_AFTERN_qnt_50 0.07245540
## <none> 0.07242610
## + TEMP_AFTERN_qnt_95 0.07242516
##
## Df AIC F Pr(>F)
## + DO_EVEN_MEAN 1 4805.2 6.8815 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.07900793
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN
##
## R2.adjusted
## <All variables> 0.08117009
## + PH_qnt_5 0.08066151
## + PH_MEAN 0.08038765
## + PH_qnt_50 0.08004213
## + TEMP_AFTERN_qnt_5 0.07993525
## + SALINITY_qnt_50 0.07959671
## + TEMP_AFTERN_qnt_50 0.07953162
## + SALINITY_MEAN 0.07934320
## + TEMP_AFTERN_qnt_95 0.07921328
## <none> 0.07900793
##
## Df AIC F Pr(>F)
## + PH_qnt_5 1 4804.7 2.4785 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.08066151
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5
##
## R2.adjusted
## <All variables> 0.08117009
## + SALINITY_qnt_50 0.08101957
## + PH_MEAN 0.08101347
## + PH_qnt_50 0.08101061
## + SALINITY_MEAN 0.08098421
## + TEMP_AFTERN_qnt_5 0.08083069
## + TEMP_AFTERN_qnt_95 0.08079813
## + TEMP_AFTERN_qnt_50 0.08077911
## <none> 0.08066151
##
## Df AIC F Pr(>F)
## + SALINITY_qnt_50 1 4805.4 1.3199 0.076 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: R2.adj= 0.08101957
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 + TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + SALINITY_qnt_50
##
## R2.adjusted
## <All variables> 0.08117009
## + PH_qnt_50 0.08117009
## + TEMP_AFTERN_qnt_5 0.08117009
## + TEMP_AFTERN_qnt_95 0.08117009
## + TEMP_AFTERN_qnt_50 0.08117009
## + SALINITY_MEAN 0.08117009
## + PH_MEAN 0.08117009
## <none> 0.08101957
##
## Df AIC F Pr(>F)
## + PH_qnt_50 1 4806.3 1.1343 0.196
Step-wise selection of variables selects parameters SALINITY_qnt_5, PH_qnt_95, SALINITY_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and PH_qnt_50 as the best model. By contrast, forward selection of variables using R2 as the criterion results selects the combination of SALINITY_qnt_5, PH_qnt_95, SALINITY_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50 as best model.
Build model using parameters selected using both models PH_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50
Determine proportion of genetic variance explained by environmental parameters of outdoor grow-out ponds while controling for family effects by running partial RDA using allele counts as response variables, dummy variables indicating parentage as conditioning matrix, and environmental parameters (PH_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50; parameters in common with both selected best models) as constraining matrix.
##
## Summary of RDA results
## Call: rda(X = X, Y = Y, Z = Z, scale = FALSE)
##
## Inertia Proportion Rank
## Total 357.275899 1.000000
## Conditional 142.662440 0.399306 14
## Constrained 1.601150 0.004482 5
## Unconstrained 213.012310 0.596212 808
## Inertia is variance
## Some constraints were aliased because they were collinear (redundant)
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5
## 0.3879 0.3599 0.3383 0.2828 0.2322
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 2.0453 1.8937 1.8458 1.8031 1.7425 1.6916 1.6507 1.6241
## (Showing 8 of 808 unconstrained eigenvalues)
R2 (0.004) needs to be adjusted based on the number of predictors, as the number of explanatory variables inflates the apparent amount of explained variance due to random correlation. The adjusted R2 value is 0.001.
Test for significance using a permutation test to generate p-value indicating whether to reject null hypothesis (environmental data does not explain genetic variation). Rows of unconstrained matrix (genetic data set) are repeatedly randomized for n permutations. Relationship is considered significant if observed relationship is stronger than the randomly permuted relationships.
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = X, Y = Y, Z = Z, scale = FALSE)
## Df Variance F Pr(>F)
## Model 5 1.601 1.2147 0.001 ***
## Residual 808 213.012
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Proportion of variance explained by the environmental predictors is Proportion column of Constrained row in summary table (equivalent to R2 of multiple regression).
The eigenvalues for the constrained axes reflect the variance explained by each canonical axis; two constraining variables were aliased, leaving two RDA axis. The environmental vectors were fit unto the ordination to obtain projections for aliased and other terms.
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5
## Eigenvalue 0.3879 0.3599 0.3383 0.2828 0.2322
## Proportion Explained 0.2423 0.2248 0.2113 0.1766 0.1450
## Cumulative Proportion 0.2423 0.4671 0.6783 0.8550 1.0000
Variance partitioning was used to compare the contribution of family and variation in the selected environmental parameters in structuring observed genomic variation. A full model, using family and environmental variables, a partial model, using family conditioned on environmental variables, and a partial model, using environmental data conditioned on family, were considered to partition the explainable variance into individual (family or environment) and shared components (family plus environment), using vegan. Significance of each component was tested using 1,000 permutations.
Partition variance to determine if geographic location or environmental data driving observed pattern(s).
Partition the explainable variance (i.e. total inertia for constrain matrix of full model): determine total, constrained, unconstrained values of inertia/proportion of total inertia (variance).
Perform RDA for all variables (spatial and environmental; full model).
## Call: rda(formula = allelecount ~ fam.mat + env.mat, scale = FALSE)
##
## Inertia Proportion Rank
## Total 357.2759 1.0000
## Constrained 144.2636 0.4038 19
## Unconstrained 213.0123 0.5962 808
## Inertia is variance
## Some constraints were aliased because they were collinear (redundant)
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 RDA12 RDA13
## 39.28 28.22 16.02 14.43 8.99 8.17 6.99 6.37 4.78 3.79 2.30 1.99 0.88
## RDA14 RDA15 RDA16 RDA17 RDA18 RDA19
## 0.53 0.38 0.34 0.31 0.27 0.22
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 2.0453 1.8937 1.8458 1.8031 1.7425 1.6916 1.6507 1.6241
## (Showing 8 of 808 unconstrained eigenvalues)
Test for significance of overall model.
The overall model is significant (P = 0.001).
Partition the variation in genetic data into components accounted for by environmental and spatial variables and their combined effect using adjusted R squared to assess partitions explained by explanatory tables and their combinations.
##
## Partition of variance in RDA
##
## Call: varpart(Y = allelecount, X = ~fam.mat, ~env.mat)
##
## Explanatory tables:
## X1: ~fam.mat
## X2: ~env.mat
##
## No. of explanatory tables: 2
## Total variation (SS): 295467
## Variance: 357.28
## No. of observations: 828
##
## Partition table:
## Df R.squared Adj.R.squared Testable
## [a+b] = X1 14 0.39931 0.38896 TRUE
## [b+c] = X2 5 0.08523 0.07967 TRUE
## [a+b+c] = X1+X2 19 0.40379 0.38977 TRUE
## Individual fractions
## [a] = X1|X2 14 0.31010 TRUE
## [b] 0 0.07886 FALSE
## [c] = X2|X1 5 0.00081 TRUE
## [d] = Residuals 0.61023 FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
Extract fraction explained by each component:
Test significance of all components using permuted p-values.
Table 3: Variance partitioning of variance explained by family (fam), environmental variables (env), and shared effects due to correlation of family and env.
| PARTITION | VARIANCE | PVAL |
|---|---|---|
| residuals | 0.610 | NA |
| fam+env+shared | 0.390 | 0.001 |
| fam+shared | 0.389 | 0.001 |
| fam | 0.310 | 0.001 |
| env+shared | 0.080 | 0.001 |
| shared | 0.079 | NA |
| env | 0.001 | 0.001 |
Compare clustering of individuals according to spawning event and family for component of genomic signal explained by differences in environmental conditions in the grow-out ponds after adjusting for family.
Fig 6a: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).
Compare distribution of progeny by within spawning events.
Fig 6b: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).
Compare distribution of progeny by dam across spawning events and sample points.
Fig 6c: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per dam.
Compare distribution of progeny by sire across spawning events and sample points.
Fig 6d: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per sire
Use the Mahalanobis distance to identify alleles with strongest associations to first two RDA axes (p < 0.001).
Table 4: Number of significantly associated loci (Mahalanobis distance >= 13.82)
| MAH_DIST >= 13.82 | n |
|---|---|
| TRUE | 33 |
Fig 7a: Distribution of loci flagged as non-associated (grey) vs outlier (red) based on Mahalanobis distance threshold of 13.82 (p < 0.001.
Compare distribution of flagged loci on genome scaffolds.
Fig 7b: Distribution of Mahalanobis distance per locus across the genome.
Table 5: Number of loci flagged as significantly correlated per scaffold.
| SCAFFOLD | p < 0.001 | p < 0.01 | p < 0.05 | total_outlier |
|---|---|---|---|---|
| scf7180000009121 | 2 | 1 | 1 | 4 |
| scf7180000009161 | 1 | 0 | 2 | 3 |
| scf7180000009255 | 1 | 1 | 1 | 3 |
| scf7180000009291 | 0 | 1 | 2 | 3 |
| scf7180000009305 | 0 | 3 | 0 | 3 |
| scf7180000009492 | 0 | 2 | 1 | 3 |
| scf7180000009535 | 0 | 0 | 3 | 3 |
| scf7180000009563 | 1 | 2 | 0 | 3 |
| scf7180000012298 | 1 | 1 | 1 | 3 |
| scf7180000012302 | 0 | 3 | 0 | 3 |
| scf7180000012359 | 0 | 0 | 3 | 3 |
| scf7180000012438 | 2 | 0 | 1 | 3 |
| scf7180000009116 | 2 | 0 | 0 | 2 |
| scf7180000009119 | 0 | 1 | 1 | 2 |
| scf7180000009177 | 2 | 0 | 0 | 2 |
| scf7180000009298 | 0 | 2 | 0 | 2 |
| scf7180000009351 | 0 | 1 | 1 | 2 |
| scf7180000009427 | 0 | 0 | 2 | 2 |
| scf7180000009429 | 1 | 0 | 1 | 2 |
| scf7180000009565 | 0 | 0 | 2 | 2 |
| scf7180000009566 | 0 | 0 | 2 | 2 |
| scf7180000009702 | 0 | 1 | 1 | 2 |
| scf7180000009895 | 1 | 0 | 1 | 2 |
| scf7180000010112 | 1 | 1 | 0 | 2 |
| scf7180000012324 | 0 | 1 | 1 | 2 |
| scf7180000012327 | 1 | 1 | 0 | 2 |
| scf7180000012435 | 1 | 0 | 1 | 2 |
| scf7180000012971 | 0 | 1 | 1 | 2 |
Compare clustering of flagged loci on scaffolds with four flagged loci.
Fig 7c: Distribution of Mahalanobis distance per locus for scaffolds with 4 flagged outlier loci. Loci above red dashed line are flagged as significantly associated.
Compare clustering of flagged loci on scaffolds with three flagged loci.
Fig 7d: Distribution of Mahalanobis distance per locus for scaffolds with three flagged outlier loci. Loci above red dashed line are flagged as significantly associated.
Compare clustering of flagged loci on scaffolds with two flagged loci.
Fig 7e: Distribution of Mahalanobis distance per locus for scaffolds with two flagged outlier loci. Loci above red dashed line are flagged as significantly associated.
Compare significant outlier from the RDA to a preliminary draft of the red drum genome to identify genes potentially affected by selection. For each flagged locus, extract all genes with boundaries falling within 50kb using the annotated red drum scaffolds.
out_tbl <- read_tsv("results/pRDA_sign_loci_model3.rda") %>%
extract(LOCUS, into = c("locus", "pos"), regex = "(.*_pilon)_(\\d+)")
# Make a list of the unique scaffolds
contigs_uniq <- out_tbl %>%
pull(locus) %>%
unique()
# Write to a file
contigs_uniq %>%
write_lines("results/drum_contigs_uniq.txt")
Extract the relevant scaffolds from a gff files of the genome.
# get annotation information for the matchin scaffolds
zgrep -f results/drum_contigs_uniq.txt data/REF/pyu_contig3_snap2.maker.output_new.0.6.AED_annotated_blast.gff.gz > data/REF/drum_matching_scaffolds.gff
Import annotation data for scaffolds with flagged loci and identify genes withing 50kb of flagged loci.
anno_raw <- read_tsv("data/REF/drum_matching_scaffolds.gff", col_names = FALSE) %>%
filter(X3 == "gene") %>%
select(scaf = X1, start = X4, end = X5, id = X9)
anno_mod <- anno_raw %>%
extract(id, "gene_id", "ID=(Soce_v1_\\d+)|") %>%
extract(scaf, "scaffold", "lcl\\|(scf.*)") %>%
mutate(gene_id = paste(gene_id, "-RA", sep = ""))
out_genes <- out_tbl %>%
mutate(pos = as.integer(pos)) %>%
mutate(left = pos - 50000, right = pos + 50000) %>%
group_by(locus, pos) %>%
group_split() %>%
map(function(x) {
#browser()
genes <- anno_mod %>%
filter(scaffold == x$locus) %>%
filter((start > x$left & start < x$right) | (end > x$left & end < x$right)) %>%
mutate(out_pos = x$pos, out_mah = x$mah_dist)
return(genes)
}) %>%
bind_rows() %>%
distinct(gene_id, .keep_all = TRUE)
out_info <- out_genes %>%
select(gene_id, out_pos, out_mah)
res_tbl <- read_tsv("data/REF/output.blastp", col_names = FALSE) %>%
extract(X1, "gene_id", "^(Soce_v1_\\d+-RA)") %>%
filter(gene_id %in% out_genes$gene_id) %>%
left_join(out_genes, by = "gene_id") %>%
select(gene_id, desc = X13, start, end, out_pos, out_mah, scaffold) %>%
arrange(desc(out_mah), desc(start))
#filtered_gene_list <- res_tbl %>%
# filter(out_pos >= start, out_pos <= end)
# Get the gene closest to each outlier
top_hits <- res_tbl %>%
group_by(scaffold, out_pos) %>%
mutate(direction = case_when(out_pos < start ~ "left",
out_pos >= start & out_pos <= end ~ "within",
out_pos > end ~ "right"),
distance = case_when(direction == "left" ~ start - out_pos,
direction == "within" ~ 0,
direction == "right" ~ out_pos - end)) %>%
top_n(-1, distance) %>%
select(scaffold, out_pos, desc)
summary_tbl <- res_tbl %>%
group_by(scaffold, out_pos) %>%
summarize(num_genes_100k = n(), mah_dist = out_mah[1]) %>%
arrange(desc(mah_dist)) %>%
left_join(top_hits, by = c("scaffold", "out_pos")) %>%
rename(nearest_gene = desc)
# Note that the output has 34 rows for 33 outliers because one outlier has 2 genes that are equally distant
write_csv(res_tbl, "results/red_drum_env_genes_tbl.csv")
write_csv(summary_tbl, "results/red_drum_env_genes_summary_tbl.csv")
A total of 33 loci were flagged as significant in the RDA analysis were used to identify potential genes under selection in the red drum genome. One hundred fifty six genes were identified in 100kb windows flanking each outlier locus. A full list of the genes identified and a summary table including the outlier location.
Table 6: Number of genes with in 50kb of flagged locus and summary of gene nearest to each flagged locus.
| scaffold | out_pos | num_genes_100k | mah_dist | nearest_gene |
|---|---|---|---|---|
| scf7180000010096_pilon | 120999 | 6 | 66.10 | PREDICTED: solute carrier family 12 member 9-like [Larimichthys crocea] |
| scf7180000009434_pilon | 677210 | 5 | 61.82 | PREDICTED: membrane-associated guanylate kinase, WW and PDZ domain-containing protein 3 [Stegastes partitus] |
| scf7180000009140_pilon | 2779714 | 3 | 45.61 | PREDICTED: methylated-DNA–protein-cysteine methyltransferase isoform X1 [Lates calcarifer] |
| scf7180000009116_pilon | 6339640 | 3 | 34.49 | PREDICTED: A disintegrin and metalloproteinase with thrombospondin motifs 7 [Larimichthys crocea] |
| scf7180000012438_pilon | 897449 | 14 | 34.46 | 60S ribosomal protein L23a [Amphiprion ocellaris] |
| scf7180000012313_pilon | 151064 | 10 | 33.55 | methyl-CpG-binding domain protein 6-like [Seriola lalandi dorsalis] |
| scf7180000012436_pilon | 143986 | 5 | 29.20 | PREDICTED: uncharacterized protein LOC104931461 isoform X1 [Larimichthys crocea] |
| scf7180000009563_pilon | 821390 | 4 | 28.48 | PREDICTED: nuclear factor of activated T-cells, cytoplasmic 2 isoform X2 [Larimichthys crocea] |
| scf7180000013177_pilon | 526531 | 3 | 26.29 | PREDICTED: junctional adhesion molecule A-like [Larimichthys crocea] |
| scf7180000012327_pilon | 949860 | 1 | 25.93 | PREDICTED: zinc finger protein 236 isoform X2 [Larimichthys crocea] |
| scf7180000009121_pilon | 14240 | 5 | 25.75 | PREDICTED: collagen alpha-3(IX) chain [Larimichthys crocea] |
| scf7180000009895_pilon | 473728 | 9 | 25.26 | uncharacterized protein LOC111858574 isoform X1 [Paramormyrops kingsleyae] |
| scf7180000009418_pilon | 148815 | 3 | 24.66 | spectrin beta chain, non-erythrocytic 5 [Seriola dumerili] |
| scf7180000009631_pilon | 839556 | 1 | 23.51 | PREDICTED: slit homolog 3 protein, partial [Larimichthys crocea] |
| scf7180000012298_pilon | 1836218 | 1 | 22.03 | PREDICTED: neurexophilin-1 [Larimichthys crocea] |
| scf7180000013182_pilon | 1081449 | 5 | 20.14 | PREDICTED: rap1 GTPase-activating protein 1-like isoform X5 [Larimichthys crocea] |
| scf7180000009121_pilon | 3257984 | 6 | 19.62 | PREDICTED: gamma-glutamyltransferase 7 [Larimichthys crocea] |
| scf7180000009177_pilon | 2488072 | 1 | 19.45 | PREDICTED: collagen alpha-6(IV) chain [Larimichthys crocea] |
| scf7180000013165_pilon | 666966 | 5 | 18.58 | PREDICTED: GDNF family receptor alpha-2-like [Larimichthys crocea] |
| scf7180000009116_pilon | 3039707 | 2 | 18.43 | uncharacterized protein LOC110962793 [Acanthochromis polyacanthus] |
| scf7180000009599_pilon | 147918 | 7 | 18.12 | PREDICTED: mitochondrial Rho GTPase 1-A-like [Larimichthys crocea] |
| scf7180000012435_pilon | 1196576 | 1 | 17.21 | PREDICTED: glutamate receptor ionotropic, kainate 2-like [Notothenia coriiceps] |
| scf7180000009733_pilon | 28954 | 3 | 17.10 | PREDICTED: death domain-containing protein 1 [Larimichthys crocea] |
| scf7180000009177_pilon | 1944675 | 7 | 15.83 | uncharacterized protein LOC111661711 [Seriola lalandi dorsalis] |
| scf7180000012438_pilon | 2323232 | 5 | 15.54 | PREDICTED: protein YIPF5 [Haplochromis burtoni] |
| scf7180000009161_pilon | 5008415 | 12 | 15.41 | PREDICTED: transmembrane protein 100-like [Larimichthys crocea] |
| scf7180000009598_pilon | 1739088 | 5 | 15.25 | PREDICTED: microtubule-associated protein RP/EB family member 2 isoform X1 [Larimichthys crocea] |
| scf7180000010112_pilon | 10613 | 3 | 15.13 | PREDICTED: uncharacterized protein LOC106013965 [Aplysia californica] |
| scf7180000009115_pilon | 4098131 | 3 | 14.48 | PREDICTED: tyrosine-protein kinase SgK223 [Larimichthys crocea] |
| scf7180000009255_pilon | 854075 | 8 | 14.45 | PREDICTED: protein DENND6A [Larimichthys crocea] |
| scf7180000009429_pilon | 429853 | 3 | 14.42 | mitotic-spindle organizing protein 2-like isoform X1 [Labrus bergylta] |
| scf7180000009429_pilon | 429853 | 3 | 14.42 | PREDICTED: DNA-directed DNA/RNA polymerase mu [Larimichthys crocea] |
| scf7180000009279_pilon | 1366461 | 6 | 14.24 | PREDICTED: probable G-protein coupled receptor 153 isoform X1 [Larimichthys crocea] |
| scf7180000009860_pilon | 1038811 | 1 | 14.16 | PREDICTED: uncharacterized protein LOC106526999 [Austrofundulus limnaeus] |
Top genes candidates (ranked; mahalanobis distance > 30)
Forester, Brenna R., Matthew R. Jones, Stéphane Joost, Erin L. Landguth, and Jesse R. Lasky. 2016. “Detecting spatial genetic signatures of local adaptation in heterogeneous landscapes.” Molecular Ecology 25 (1): 104–20. https://doi.org/10.1111/mec.13476.
Forester, Brenna R., Jesse R. Lasky, Helene H. Wagner, and Dean L. Urban. 2018. “Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations.” Molecular Ecology 27 (9): 2215–33. https://doi.org/10.1111/mec.14584.
Laget, Sophie, Michael Joulie, Florent Le Masson, Nobuhiro Sasai, Elisabeth Christians, Sriharsa Pradhan, Richard J Roberts, and Pierre Antoine Defossez. 2010. “The human proteins MBD5 and MBD6 associate with heterochromatin but they do not bind methylated DNA.” PLoS ONE 5 (8): e11982. https://doi.org/10.1371/journal.pone.0011982.
Liu, Chuan-Ju, Wei Kong, Kiril Ilalov, Shuang Yu, Ke Xu, Lisa Prazak, Marc Fajardo, et al. 2006. “ADAMTS‐7: a metalloproteinase that directly binds to and degrades cartilage oligomeric matrix protein.” The FASEB Journal 20 (7): 988–90. https://doi.org/10.1096/fj.05-3877fje.
Oksanen, Jari, Roeland Kindt, Pierre Legendre, Bob O Hara, Gavin L Simpson, Petery Solymos, M Henry H Stevens, and Helene Wagner. 2010. “vegan: Community Ecology Package. R package version 1.17-3.” October 10 (01): 2008. http:/.
Verri, Tiziano, Genciana Terova, Alessandro Romano, Amilcare Barca, Paola Pisani, Carlo Storelli, and Marco Saroglia. 2012. “The SoLute Carrier (SLC) Family Series in Teleost Fish.” In Functional Genomics in Aquaculture, 219–320. Oxford, UK: Wiley-Blackwell. https://doi.org/10.1002/9781118350041.ch10.
Walter, Ronald B., Huang Mo Sung, Gabe W Intano, and Christi A Walter. 2001. “Characterization of O6-methylguanine-DNA-methyltransferase (O6-MGMT) activity in Xiphophorus fishes.” Mutation Research - Genetic Toxicology and Environmental Mutagenesis 493 (1-2): 11–22. https://doi.org/10.1016/S1383-5718(01)00169-3.
Wright, Gavin J, Jonathan D Leslie, Linda Ariza-McNaughton, and Julian Lewis. 2004. “Delta proteins and MAGI proteins: An interaction of Notch ligands with intracellular scaffolding molecules and its significance for zebrafish development.” Development 131 (22): 5659–69. https://doi.org/10.1242/dev.01417.